
Chapter 11 


Biomarkers in Rare Genetic Diseases 


Chiara Scotton and Alessandra Ferlini 


Additional information is available at the end of the chapter 


http://dx.doi.org/10.5772/63354 


Abstract 


Biomarkers offer a way to speed up medical research by shedding light on the
physiopathological mechanisms of disease. Furthermore, biomarkers are considered invaluable
tools for monitoring disease progression, prognosis, and response to drugs, especially in 
clinical trials, where they can be used to assess the efficacy, efficiency, and side effects of 
novel drugs. 


Biomarkers also pave the way to personalised medicine, a rapidly developing field that 
is of particular interest in rare diseases (RDs), i.e. those with a prevalence of less than 
5/10,000, which are often genetic in origin. Although rare genetic diseases may be less 
appealing targets for pharmaceutical companies, they are nevertheless in urgent need of 
research into their diagnosis, prevention, treatment, and standards of care. 


Here we summarise the state of the art in RDs, genetic diagnosis, and novel strategies 
aimed at accurately identifying and defining gene mutations, and review the evidence 
emerging from the latest research and clinical trials. We focus in particular on novel 
biomarkers, describing the different types discovered so far, highlighting their 
importance and indicating how they may be translated into research, diagnostics, 
treatment, and preventative applications in personalised strategies for RDs. 


Keywords: biomarker, rare disease, genetic disease, genomics, transcriptomics, proteomics


1. Introduction 


As each rare disease (RD) only affects a relatively small number of individuals across the globe, 
there are often great obstacles to their research, diagnosis, treatment, and prevention. In 
Europe, a disease is considered to be rare, or orphan, when it affects fewer than 5 people in 
10,000, in line with the definitions adopted by the European Commission (EC) in its Orphan
Drugs Regulation No. 141/2000 and Commission Communication COM (2008) 679/2 on RDs:
Europe's challenges [1]. However, RDs are often chronic, progressive, degenerative,
life-threatening and/or severely disabling in terms of a patient's quality of life, frequently leading
to a lack or loss of autonomy. 


Although infections, allergies, and environmental factors, linked in particular to degenerative
and proliferative processes (e.g. auto-immunity or cancer), may be implicated in the onset of
RDs, the vast majority, approximately 80%, are caused by genetic defects (though not all RDs
are genetic diseases). The signs and symptoms of many RDs may therefore be observed at birth
or during childhood, and, indeed, roughly 75% of RDs affect children, including
chondrodysplasia, neurofibromatosis, osteogenesis imperfecta, proximal spinal muscular atrophy and
Rett syndrome. The RDs that manifest in adulthood, by contrast, include amyotrophic
lateral sclerosis (ALS), and Charcot-Marie-Tooth, Crohn, and Huntington diseases.


Although each individual condition may fit the definition of rare, about 7000 distinct RDs have 
been identified so far, affecting 6-8% of the global population. In fact, it is estimated that 350 
million people worldwide suffer from even rarer conditions, suggesting that one in 20 patients 
will be affected by an orphan disease [2]. Therefore, collectively, RDs are not at all rare, and 
as a whole, they generate a considerable socioeconomic burden. 


In addition to there being a wide spectrum of RDs, they are also characterised by great 
variability in the age of onset, signs and symptoms, and patterns of tissue/organ involvement. 
To further complicate the issue, molecular testing and phenotype analysis reveal that
mutations occurring in the same gene can be associated with different clinical diagnoses, and
marked intra- and interfamilial phenotype variability has been documented. RDs are therefore 
often extremely difficult to diagnose, and only about 4000 genes have been identified for the 
7000 RDs described in the OMIM database [3]. Understandably, therefore, the IRDiRC [4] has 
set its members the challenge of diagnosing most, if not all, RDs by 2020, and discovering at 
least 200 new therapeutic options for their patients. 


Nevertheless, without early diagnosis and effective treatment strategies, it is impossible to 
guarantee any improvement in the quality of life and/or life expectancy of such patients. 
Furthermore, our lack of knowledge regarding the causes, physiopathological mechanisms, 
and clinical progression of RDs makes it difficult to apply available treatments and to develop 
novel therapeutic strategies. In addition, the small number of patients complicates the 
recruitment of an adequate sample for clinical trials, especially in children, who make up an
even smaller percentage of the overall RD population. This is an obvious deterrent to the
pharmaceutical industry, which has only limited interest in developing and marketing
products for this small consumer base. In order to counter some of these problems, both
national governments and the EU have made orphan drug laws and funding a priority, but,
despite this recent interest, treatment options are currently only available for 5% of RDs [2]. 


It is not only RDs that could benefit from more activity in this area, as RD research is also 
considered pivotal for many common diseases, and has in some cases revealed mechanisms 
and pathways that have been subsequently associated with other rare or common diseases [5]. 
Indeed, several RDs have been linked to a high degree of genetic and phenotypic heterogeneity;
for example, mutations occurring in the LMNA gene can cause different disease types by
affecting different tissues or processes, such as (i) striated muscle (muscular dystrophies such as
Emery-Dreifuss muscular dystrophy and limb-girdle muscular dystrophy, or dilated
cardiomyopathy), (ii) adipose tissue (lipodystrophy syndromes), (iii) peripheral nerve (peripheral
neuropathy such as Charcot-Marie-Tooth disorder) or (iv) accelerated ageing (progeria
diseases). There are also clinical signs that can be associated with both genetic and acquired
disease. For instance, renal cell carcinoma is characterised by the dysregulation of metabolic 
pathways (oxygen, iron, and nutrient sensing) which are also manifestations of rare hereditary 
syndromes such as Von Hippel-Lindau (VHL, OMIM 193300) and Birt-Hogg-Dubé (BHD, 
OMIM 135150) syndromes, as well as hereditary leiomyomatosis and renal cell carcinoma 
(HLRCC, OMIM 150800) [5]. It is therefore essential for the research being carried out
worldwide to focus on identifying characteristic determinants able to discriminate between specific
disease states, stages, and probabilities of responding to particular treatments: put simply,
biomarkers.


2. Biomarker: definition and utility 


Biomarkers were first described and defined in 2001 by two different review papers [6, 7], both 
of which suggested that they would be the key to understanding the physiopathology of 
disease and discovering novel treatment strategies. The classic definition of a biomarker is ‘a 
characteristic that is objectively measured and evaluated as an indicator of normal biological 
processes, pathogenic processes, or pharmacological responses to a therapeutic intervention’. 
In other words, biomarkers are ‘measurables’ that rely on tools and technologies for assessing 
body fluids or tissues (blood, urine, cells, skin, etc.), such as DNA analysis [point variants, copy
number variation (CNV), translocations, methylation analysis], RNA analysis [expression
profile and microRNA (miRNA) characterisation], protein analysis (quantification of
circulating proteins), and imaging technologies, or other means of physiological measurement [8].


Figure 1. Literature survey. Number of citations in PubMed in which the keyword ‘biomarker’ is present in ‘title’
and/or ‘abstract’ from 2001 to 2015.


Since they were first defined, the number of published papers related to biomarker discovery has
increased more than 20-fold (Figure 1), and the discovery and development of novel
biomarkers have kept pace with technical advances, in particular the advent of high-throughput
analysis technologies. Moreover, the large number of grant projects set up over the last 5 years 
to fund biomarker research, including BIO-NMD [9] and NeurOmics [10], has begun to yield 
considerable fruits in this field. 


The BIO-NMD project is a Europe-wide research network whose aim is to identify and validate 
biomarkers for rare neuromuscular diseases, such as dystrophinopathies (Becker muscular 
dystrophy, OMIM 300376; Duchenne muscular dystrophy OMIM 310200; dilated cardiomy- 
opathy, OMIM 302045) and COL6-related myopathies (Bethlem myopathy, OMIM 158810; 
Ullrich congenital muscular dystrophy, OMIM 254090). Funded by the EU (2009-2012), BIO- 
NMD set out to investigate different human tissues/cells/fluids using multiple -omic strategies 
(genomics, transcriptomics, and proteomics), an approach that led to the identification of 
several biomarkers. Thanks to this project, both plasma and tissue biomarkers that will be 
useful for monitoring disease progression, prognosis, and treatment response have been 
described and will ultimately help to pinpoint appropriate options for personalised treatment 
[11,12]. 


In a similar vein, the EU's NeurOmics project is still ongoing and aims to revolutionise 
diagnostics and develop new treatments for 10 major neuromuscular and neurodegenerative 
diseases by using sophisticated -omics technologies. To do this, it has brought together leading 
European research groups, five highly innovative SMEs, and experts from outside the EU, who 
are all working to identify genes and develop biomarkers for clinical application, as well as to 
identify drug targets and improve understanding of the physiopathology of the diseases in 
question. 


This research activity has been largely prompted by the versatility of biomarkers. Indeed, the 
classical view of biomarkers as a clinical end-point, an objective snapshot that reflects how a 
patient feels, functions or survives, is extremely reductive. In addition to numerous
applications in clinical settings, biomarkers may also serve as a surrogate end-point, a predictor of
clinical benefit (or lack thereof) based on epidemiological, therapeutic, physiopathological, or 
other scientific evidence [13]. In other words, a biomarker may act as a clinically meaningful 
end-point in clinical trials. Such surrogate end-point biomarkers are foreseeably of particular 
benefit in RDs, in which a high percentage of diseases without a genetic cause, slow disease 
progression, chronic nature of the diseases, high heterogeneity of signs and symptoms within 
the same phenotype, and the difficulty in objectively measuring any change in symptoms 
dramatically increase the expense of clinical trials. Not only the cost but also the difficulty in 
undertaking trials based on conventional end-points severely curtails their number, and the 
lack of sensitive, specific, and timely outcome measures hinders the discovery and
development of novel treatments.


However, a biomarker can lessen the burden of the clinical trial process by providing
information about the safety and efficacy of treatments before the collection of definitive clinical
data, which provides the opportunity for mid-course re-appraisal, and even interruption if the
intervention being investigated is revealed as potentially harmful to participants [14]. Indeed,
biomarkers are far superior to subjective measurements, which may not be directly associated
with a disease characteristic, or able to detect small changes, especially in the short term.


Biomarkers, on the other hand, can provide an objective measurement of aspects precisely
correlated to a specific disease condition, potentially enabling small changes in status to be
identified, the disease progression to be assessed, and the likely effects of the therapeutic
intervention to be predicted [1] while the trial is ongoing, as well as in real-world settings.


Considering the versatility of biomarkers, the European Medicines Agency (EMA) has
attempted to standardise them by drawing up a list of the features of an ‘ideal’ biomarker,
namely [15]:


Analytical validity. Like a fingerprint, a biomarker should enable measurement, within a 
specific range, of a parameter able to accurately and clearly distinguish between altered/ 
normal status or treatment response/non-response. The test(s) used to detect a biomarker 
should be accurate, reliable, and reproducible, and their technological limits clearly defined. 
As the analytical accuracy depends on laboratory procedures, such as sample preparation 
and technology application, these should be reported in order to ensure reproducibility of 
biomarker discovery and validation. 


Clinical validity. Like a mirror, a biomarker should accurately reflect the features of a disease 
(or treatment), detecting even small changes, and not be influenced by circumstantial factors 
such as diet, exercise, stress, age, sex, or the environment, i.e. an alteration in the disease 
features should always be reflected by the biomarker, and a difference in the biomarker 
should always reflect a change in the disease. In other words, an ideal biomarker will 
identify specific disease parameters and be sensitive to any change in them. Likewise, to be 
clinically valid, a biomarker must display a high degree of accuracy (indicating correctly 
whether a patient has or does not have the disease or treatment effects in the vast majority 
of cases). 


Clinical utility. Like a prophet, a biomarker should herald the outcome of a given situation/ 
intervention. In other words, biomarkers should predict the members of a population who 
will develop a disease, manifest a disease progression, or respond to a specific treatment. 
The clinical utility of a biomarker in an appropriate population can be measured by two
predictive values, the positive predictive value (PPV) and the negative predictive value (NPV),
which are respectively used to quantify the probability that a person with a positive test for
that biomarker will manifest the outcome predicted by the test, and the probability that a
person with a negative test will not manifest it (e.g. will not respond to the
intervention/treatment).


Non-invasiveness. Like an open door, a biomarker should grant accessibility, i.e. enable an 
early, sensitive measurement, of disease severity etc., via the simple collection of body fluids 
(urine/blood) or scanned images (e.g. MRI or PET, etc.). This will allow a disorder to be 
monitored at different time points, without recourse to invasive procedures such as biopsies 
or tissue analysis. 


Feasibility. Like the passage of time, a biomarker should be practical to identify and 
measure, as well as invariable, irrespective of the type of sample collection, processing 
procedures, or methods used in its detection.


Time and cost-effectiveness. Like a moneybox, a biomarker should be quick and easy to use
and not be so expensive and time-consuming to measure that it cannot be used as a surrogate 
endpoint in clinical trials or to aid diagnostics and disease monitoring. 
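

The predictive values named under clinical utility, together with the accuracy mentioned under clinical validity, can be derived from a simple two-by-two comparison of the test result with the true outcome. The Python sketch below is illustrative only: the class and function names and the cohort counts are hypothetical and are not taken from the EMA guidance or from any study cited in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts obtained by comparing a biomarker test with the true outcome."""
    true_pos: int   # test positive, outcome present
    false_pos: int  # test positive, outcome absent
    true_neg: int   # test negative, outcome absent
    false_neg: int  # test negative, outcome present


def predictive_values(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, accuracy, PPV and NPV as fractions."""
    total = c.true_pos + c.false_pos + c.true_neg + c.false_neg
    return {
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        "accuracy": (c.true_pos + c.true_neg) / total,
        # PPV: probability that a positive test is followed by the outcome.
        "ppv": c.true_pos / (c.true_pos + c.false_pos),
        # NPV: probability that a negative test is not followed by the outcome.
        "npv": c.true_neg / (c.true_neg + c.false_neg),
    }


# Hypothetical cohort: 40 patients with the outcome and 160 without it.
counts = ConfusionCounts(true_pos=36, false_pos=8, true_neg=152, false_neg=4)
metrics = predictive_values(counts)
```

With these invented counts the PPV is 36/44 (about 0.82) and the NPV is 152/156 (about 0.97). Unlike sensitivity and specificity, both predictive values depend on how common the outcome is in the population tested, which is why the text specifies "an appropriate population". The sketch does not guard against empty rows or columns, which would cause a division by zero.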


Biomarkers that possess all these features will inevitably lead to improvements in clinical trials, 
especially in the field of personalised medicine. Personalised medicine shifts the current ‘one- 
size-fits-all’ approach to a more individual line of attack or defence, centred on giving ‘the 
right drug to the right patients at the right time’ [16]. This is particularly crucial in RDs, in 
which successful treatment development is generally hindered by the small number of patients 
and short runs that characterise trials for novel interventions. 


3. Strategies for biomarker discovery 


In recent years, novel techniques and strategies have emerged for biomarker discovery, and 
there are currently two major approaches being applied: 


• Candidate approach. This is a hypothesis-driven method based on knowledge of the
relevant physiopathological processes, disease pathway(s), or key molecule(s). It analyses 
the known gene/protein and their linking products in order to discover a qualitative or 
quantitative variation in diseased samples (fluids, cells, tissues) with respect to normal ones. 


• High-throughput approach. This is a hypothesis-free strategy that takes advantage of the
development of novel techniques for generating very large amounts of data to compare
pathological and normal status. This ‘big-data’ approach is extremely powerful, although
it is cost-intensive and requires significant time for validation and clinical definition of the 
biomarkers identified. 


3.1. Discovery of genetic variations 


Next-generation sequencing (NGS) techniques are based on high-throughput genomic and 
transcriptomic sequencing. In brief, target regions can be isolated from the entire genome by 
hybridisation to complementary sequences. This ‘capturing’ is performed on demand, to 
isolate sequences that may consist of protein-coding regions only (whole exome sequencing), 
a specifically targeted gene region (focusing on a limited number of known genes), or the entire 
genome (whole-genome sequencing). The captured region can then be sequenced by one of 
several methods (pyrosequencing, Roche 454; sequencing by reversible termination, Illumina;
sequencing by ligation, SOLiD; semiconductor sequencing, Ion Torrent), and the resulting
output is composed of several sequence reads, which are then computationally aligned to the 
known genome in order to unravel any variations, such as small insertions or deletions [17]. 
Unlike traditional Sanger sequencing, which reads a sequence base by base, NGS is very time- 
efficient, enabling the simultaneous analysis of millions of base pairs organised in multiple 
aligned reads. Despite its efficiency, however, NGS is unable to detect dynamic mutations (e.g. 
triplet expansions) and still has limited capability to identify CNVs. Nevertheless, while we 
await the development of specific algorithms to overcome these limitations, NGS can be
integrated with DNA profiling tools, such as array-CGH, for the detection of CNVs and other
genetic imbalances. 
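

The idea of comparing aligned reads with the known genome to reveal variations can be illustrated with a toy pile-up. The Python sketch below is a deliberate simplification and not a description of any real NGS pipeline: it assumes the reads have already been aligned without gaps, so it detects substitutions only (not insertions or deletions), and the function name, thresholds, and sequences are all hypothetical.

```python
from collections import Counter, defaultdict


def call_variants(reference, aligned_reads, min_depth=3, min_fraction=0.75):
    """Report positions where most overlapping reads disagree with the reference.

    aligned_reads is an iterable of (start, sequence) pairs, where start is
    the 0-based reference position of the first base of an ungapped read.
    Returns a list of (position, reference_base, observed_base, depth).
    """
    # Build the pile-up: for every reference position, count the bases seen.
    pileup = defaultdict(Counter)
    for start, sequence in aligned_reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if 0 <= position < len(reference):
                pileup[position][base] += 1

    variants = []
    for position in sorted(pileup):
        counts = pileup[position]
        depth = sum(counts.values())
        base, support = counts.most_common(1)[0]
        if (depth >= min_depth
                and base != reference[position]
                and support / depth >= min_fraction):
            variants.append((position, reference[position], base, depth))
    return variants


# Invented reference and reads; four of the five reads carry T at position 5.
reference = "ACGTACGTAC"
reads = [(0, "ACGTAC"), (2, "GTATGT"), (3, "TATGTA"), (4, "ATGTAC"), (1, "CGTATG")]
variants = call_variants(reference, reads)  # -> [(5, 'C', 'T', 5)]
```

The fixed 75% threshold would miss heterozygous variants, which are expected in roughly half of the reads; real variant callers use statistical models and base-quality information instead.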


The methylation profile of genes can also be explored via epigenomics. In fact, the recent 
advent of methylomic profiling now allows us to determine the DNA methylation status of 
the entire genome, and thereby to identify an increasing number of genes that are methylated 
in disease states, particularly cancer [18]. 


3.2. Discovery of RNA variations 


Complementary genome-wide information technologies can be used to identify qualitative 
and quantitative variations at the RNA level. For example, a gene expression microarray or 
high-throughput technology such as RNA sequencing (RNAseq) can be used to perform 
transcriptome analysis. Transcriptome profiling can be performed on samples from biopsy or 
cell cultures from specific affected tissues, or, less invasively, from different body fluids such 
as urine, blood, or saliva [1]. The technique enables the generation of enriched RNA/cDNA 
libraries that cover the entire transcribed region, or, alternatively, a catalogue of genes of 
interest that can be used to evaluate gene expression or identify novel transcripts, alternative 
splicing, and/or gene fusion products. 


Although transcript sequencing is heavily influenced by the tissue/cell type analysed,
transcription and RNA editing being profoundly tissue specific, it is highly versatile. Indeed, in
addition to mRNAs, transcriptomics can be extended to non-coding RNAs such as miRNAs,
single-stranded sequences of 18-25 nucleotides that regulate the expression of target genes and
are already known for their role as biomarkers.


Gene expression profiling is also considered a very powerful method of identifying biomarkers 
of pathological status, disease progression, and/or drug response, with the advantage of 
exploring specific tissue behaviour [19]. Microarray technologies may be used to quantify and
compare the expression levels of many transcripts in diseased and healthy samples,
or at different time points (e.g. pre- and post-treatment). 
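

A common first step when comparing expression between diseased and healthy samples is to compute a per-gene log2 fold change. The Python sketch below shows the arithmetic only, under stated assumptions: the expression values are invented and assumed to be already normalised, the gene names are placeholders, and the pseudocount and the two-fold threshold are arbitrary choices rather than recommendations from the studies cited here.

```python
import math
from statistics import mean


def log2_fold_changes(diseased, healthy, pseudocount=1.0):
    """Return log2(mean diseased / mean healthy) for genes present in both inputs.

    The pseudocount is added to both means so that genes with zero
    expression in one group do not cause a division by zero.
    """
    changes = {}
    for gene in diseased.keys() & healthy.keys():
        d = mean(diseased[gene]) + pseudocount
        h = mean(healthy[gene]) + pseudocount
        changes[gene] = math.log2(d / h)
    return changes


def flag_candidates(changes, threshold=1.0):
    """Return, in alphabetical order, genes whose |log2 fold change| >= threshold."""
    return sorted(g for g, lfc in changes.items() if abs(lfc) >= threshold)


# Invented, already normalised expression values for three placeholder genes.
diseased = {"GENE_A": [79, 83, 75], "GENE_B": [19, 21, 20], "GENE_C": [7, 9, 8]}
healthy = {"GENE_A": [19, 21, 17], "GENE_B": [20, 22, 18], "GENE_C": [35, 33, 37]}

changes = log2_fold_changes(diseased, healthy)
candidates = flag_candidates(changes)  # -> ['GENE_A', 'GENE_C']
```

Here GENE_A is four-fold higher in diseased samples (log2 fold change of 2), GENE_C is four-fold lower (log2 fold change of -2), and GENE_B is unchanged. A fold change alone does not establish significance; a real analysis would also require replicate-aware statistical testing and independent validation.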


3.3. Discovery of protein biomarkers 


The evolution of mass spectrometry (MS)-based technologies and the development of other
proteomic strategies such as two-dimensional difference gel electrophoresis (2D-DIGE) have
considerably advanced our understanding of the nature of the proteome. This can be analysed to explore
specific cellular functions and the control of specific biological processes, although the 
complexity and size of the human proteome pose larger challenges than those encountered in 
genomic and transcriptomic research [20]. Indeed, the individual proteome can change 
markedly over the course of a lifetime, and a single gene often produces very different 
isoforms, by alternative splicing or post-translational modifications such as phosphorylation, 
glycosylation, acetylation, and ubiquitination. However, proteins are often a target for 
pharmacological intervention, and proteomic technologies able to evaluate the expression 
level of soluble proteins are emerging, thereby paving the way to the discovery and validation 
of protein biomarkers.


The most common novel high-throughput approaches currently being used in discovery
proteomics are those based on MS. These technologies enable the analysis of complex mixtures 
of proteins, measuring the mass-to-charge ratio of charged particles in order to determine their 
mass, quantity, and elemental composition. There are essentially two different types of MS 
approaches, namely top-down experiments, which analyse the whole protein, and bottom-up, 
which analyse proteins previously digested by proteases. For characterisation purposes, the 
resulting peptide mixtures may then be separated using different strategies, such as liquid 
chromatography (LC), gas chromatography, or ion mobility spectrometry, and then the 
identified proteins can be quantified. To achieve this, samples can be isotopically labelled by
different methods, such as stable isotope labelling by amino acids in cell culture (SILAC),
isotope-coded affinity tagging (ICAT), isobaric tags for relative and absolute quantification
(iTRAQ), and mass differential tags for relative and absolute quantification (mTRAQ) [21].
A typical MS protocol would
therefore consist of sample loading (of intact or digested protein), vaporisation, ionisation, and 
separation of the ionised sample by mass-to-charge ratio, detection in an MS instrument, and 
generation of a detailed profile of the exact chemical composition of a sample. 
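

The relationship between the measured mass-to-charge ratio and the mass of the analyte can be made concrete with a short calculation. The Python sketch below assumes positive-mode ionisation in which each charge is carried by one added proton, which is one common case and not the only one; the function names and the example peptide mass are hypothetical.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons, rounded


def mass_to_charge(neutral_mass_da, charge):
    """Return m/z for a molecule carrying `charge` added protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


def neutral_mass(mz, charge):
    """Invert mass_to_charge: recover the neutral mass from m/z and charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz * charge - charge * PROTON_MASS_DA


# Hypothetical peptide with a neutral mass of 1500.0 Da observed as a 2+ ion.
mz_doubly_charged = mass_to_charge(1500.0, 2)         # about 751.0073
recovered_mass = neutral_mass(mz_doubly_charged, 2)   # about 1500.0
```

The same molecule therefore appears at a different m/z for each charge state, which is why the charge must be determined before a mass can be assigned.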


By these means, it is possible to differentially analyse proteins from different biological 
processes or disease states in order to discover candidate biomarkers. Many biomarkers used 
in existing clinical practice are assays to quantify proteins, and proteomics techniques such as 
2D-DIGE can be used to separate non-digested proteins within a biological sample based upon
both charge (via isoelectric focusing) and apparent molecular mass (by gel electrophoresis).
Such strategies thereby provide a measure of protein abundance and enable the identification 
of isoforms and post-translational modifications [21]. Validation of such potential biomarkers 
can be performed using a common protein expression method such as Western blotting 
and/or antibody-based assays. 


As shown in the workflow illustrated in Figure 2, biomarker discovery can be facilitated by 
using a strategy combining two or more of the above approaches, for example:


• High-throughput/candidate approach. This strategy exploits the benefits of both
techniques by filtering the high-throughput data beforehand using a candidate list or a
functional interactome map. This provides better applicability to the disease/treatment 
response but, considering the large amount of data generated by the high-throughput 
method, is very labour-intensive. 


• Multiple -omics approach. This is a highly demanding method based on the simultaneous
use of genomics, transcriptomics, proteomics, etc., to define the
interactome and functional pathways. If performed on the same individual at different
times, or disease or treatment stage, such analyses are able to monitor changes in an 
individual's -omic profile, thereby lending themselves to the development of personalised 
medicine strategies. 
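

The combined high-throughput/candidate strategy described above amounts to intersecting a large result table with a prior list of genes of interest. The Python sketch below is a minimal illustration under stated assumptions: the record layout (a gene name and a p-value per hit), the significance cut-off, and the placeholder gene names are hypothetical and do not come from any study cited in this chapter.

```python
def filter_by_candidates(hits, candidate_genes, max_p_value=0.05):
    """Keep high-throughput hits that are on the candidate list and significant.

    hits is a list of dicts with the keys "gene" and "p_value".
    Gene names are compared case-insensitively. The result is sorted
    from the smallest to the largest p-value.
    """
    candidates = {gene.upper() for gene in candidate_genes}
    kept = [
        hit for hit in hits
        if hit["gene"].upper() in candidates and hit["p_value"] <= max_p_value
    ]
    return sorted(kept, key=lambda hit: hit["p_value"])


# Invented high-throughput results and an invented candidate list.
hits = [
    {"gene": "GENE_A", "p_value": 0.001},
    {"gene": "GENE_B", "p_value": 0.20},
    {"gene": "GENE_C", "p_value": 0.03},
    {"gene": "GENE_D", "p_value": 0.004},
]
candidate_genes = ["gene_c", "GENE_A", "GENE_B"]

shortlist = filter_by_candidates(hits, candidate_genes)
shortlisted_genes = [hit["gene"] for hit in shortlist]  # -> ['GENE_A', 'GENE_C']
```

In this example GENE_B is dropped because it is not significant and GENE_D because it is not on the candidate list, even though its p-value is small. This shows the trade-off of the combined strategy: it improves relevance to the disease at the cost of discarding unexpected findings.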


The benefit of multiple -omics approaches has been clearly demonstrated by Finkel
et al. in their recent ‘BforSMA’ cross-sectional study aimed at identifying novel
biomarkers in spinal muscular atrophy (SMA, OMIM: 253300). SMA is a
neurodegenerative motor neuron disorder caused by homozygous/compound
heterozygous mutations in the survival motor neuron 1 (SMN1) gene [22]. It is
characterised by the degeneration of the anterior horn cells of the spinal cord and
leads to symmetrical muscle weakness and atrophy. The SMN protein plays a crucial
role in RNA biosynthesis in all tissues, forming a large, multiprotein complex that drives
the assembly of small nuclear ribonucleoproteins (snRNPs) of the spliceosomes.
Through functions in RNP assembly, the SMN complex is required for the
expression of essentially all protein-coding genes [23]. Preliminary results from
the ‘BforSMA’ project, which is based on proteomics, metabolomics, and transcriptomics
discovery platforms, indicate the discovery of a total of 200 candidate
biomarkers, including 97 plasma proteins, 59 plasma metabolites, and 44 urine metabolites
that could potentially be used to address clinical trial design and identify novel
therapeutic targets in SMA [22].


Figure 2. Flowchart of biomarker discovery. Different technologies from genomic, transcriptomic, and proteomic levels
are able to detect potential biomarkers; subsequently the connection between the three approaches and validation may
confirm the biomarker identification.


4. Molecular biomarkers 


4.1. Genomic biomarkers 


New molecular biomarkers can be detected at different levels. According to the Food and
Drug Administration/EMA definition, genomic biomarkers include both DNA and RNA
determinants; they therefore encompass DNA methylation status and
sequence variations, such as single-nucleotide polymorphisms (SNPs), insertions, deletions, 
translocations, CNV, as well as RNA alterations such as differential gene expression and 
miRNAs (Figure 3). The current research focus has shifted somewhat, from SNP to haplotype 
analysis, which it is hoped will furnish useful disease, prognostic, or predictive biomarkers.


Figure 3. Schematic representation of the main features (applications, drawbacks, and advantages) of biomarker types, in
terms of stability, specificity, repeatability, and accessibility.


Indeed, patients with Duchenne muscular dystrophy (DMD), for example, despite having common
features such as the absence of dystrophin in the striated muscles, show different rates of
disease progression, especially in terms of the age at loss of ambulation. This supports the
idea that genetic modifiers exist and can influence both the phenotype and the clinical
severity of the disease. In this regard, Flanigan et al. identified SNPs located within the
LTBP4 gene, which encodes latent transforming growth factor β (TGF-β) binding protein 4,
in more than 200 patients, showing that individuals homozygous for the IAAM LTBP4 haplotype
remained ambulatory significantly longer than those heterozygous or homozygous for the VTTT
haplotype [24]. Furthermore, in
long QT syndrome (LQTS) —a rare hereditary cardiac disorder characterised by a prolongation 
of the QT interval due to mutations in genes encoding ion channels responsible for the 
generation of electrical impulses—it appears that the haplotype group C-G-T of the heat shock 
protein HSP-70 gene is strongly related to the disease condition and may therefore represent 
a diagnostic biomarker [25]. 


An example of potential RNA biomarkers has been provided in a study by Harten et al. into 
Hutchinson-Gilford progeria syndrome (HGPS, OMIM: 176670). This is a rare, fatal, autosomal 
dominant premature-aging disease (prevalence: <1/1,000,000) caused by splicing mutations in
the LMNA gene that create cryptic splice sites and lead to the production of progerin, a toxic,
permanently farnesylated splicing variant [26]. In their study, the authors analysed the
expression profile of several matrix metalloproteinases, identifying a donor-age-dependent
reduction in the expression of MMP-3 mRNA in HGPS primary dermal fibroblast cultures,
suggesting that a fall in MMP-3 correlates with disease severity in vivo [26].


RNAseq can be used in conjunction with new technologies such as NGS to analyse the whole
transcriptome both quantitatively and qualitatively and thereby provide information about 
alterations in gene expression. This approach can potentially speed up the process of genomic 
biomarker discovery and was used to good effect in a recent study aimed at tracing a detailed 
RNA profile in both collagen VI myopathy (ColVI) patients and an animal model of the same. 
Collagen VI myopathies are genetic disorders arising from mutations in the collagen VI genes; 
they range from the severe Ullrich congenital muscular dystrophy (UCMD, OMIM: 254090, 
prevalence: 1-9/1,000,000) to the milder Bethlem myopathy (BTHLM1, OMIM: 158810,
prevalence: <1/1,000,000), both of which can be inherited in either a dominant or a recessive manner.
Generally speaking, neither the type of mutation nor the effect of the mutation on the protein
structure/function allows precise discrimination between the two phenotypes. However, by a
combined RNAseq approach, the authors identified the potential involvement of circadian
genes, reporting a marked deregulation of the CLOCK gene in UCMD patients alone,
suggesting it as a candidate biomarker of disease severity in ColVI [27].


miRNAs also make quite appealing biomarkers, and a recent study by Eisenberg et al. found 
that the levels of muscle-specific miRNAs (myomirs) are correlated with disease severity in 
several muscular dystrophies, including limb girdle and Duchenne/Becker muscular
dystrophies [28]. miRNA studies have also been extended to other RDs, such as cystic fibrosis (CF,
OMIM: 219700). This is a recessive genetic disorder (prevalence: 1-9/100,000) characterised by
eccrine gland dysfunction, chronic obstructive lung disease, and exocrine pancreatic
dysfunction. It is caused by mutations in the cystic fibrosis transmembrane conductance
regulator gene (CFTR), and it
appears that miR-494 and miR-145 are significantly over-expressed in CF tissues with respect
to those of healthy individuals, suggesting their role as disease biomarkers [29].


As mentioned above, genomic biomarkers also include epigenomic modifications such as 
DNA methylation. Recent studies on Friedreich ataxia (FRDA, OMIM 229300), the most 
common ataxia, which is caused by an expanded GAA repeat in the first intron of FXN, have 
demonstrated that hypermethylation of the gene region upstream of the expanded GAA repeat 
correlates with clinical severity, while hypomethylation of the downstream region correlates 
with the age at onset [30]. It is evident, therefore, that genomic biomarkers may have a wide 
spectrum of functions as clinical and research outcome measures. 


4.2. Proteomic biomarkers 


Proteomic studies have several advantages over genomic analysis, not least the potential 
identification of biomarkers more closely related to biological function/dysfunction. Further- 
more, proteomic biomarkers are more readily accessible than genomic biomarkers, being 
detectable in body fluids such as blood and urine (Figure 3). This makes them potentially useful 
in clinical trials as early indicators of the disease condition, disease progression, or treatment 
effects (drug response or adverse effects). 
As an example, Martell et al. have provided a clear indication of biomarker accessibility and 
utility in Morquio A syndrome, also named mucopolysaccharidosis IVA (MPS IVA, OMIM: 253000, 
prevalence: 1-9/1,000,000). This recessive lysosomal storage disorder is caused by mutations 
in the N-acetylgalactosamine-6-sulfatase gene (GALNS), which encodes an enzyme required to 
degrade keratan sulphate and chondroitin-6-sulphate. The resulting enzyme deficiency produces 
a wide spectrum of clinical features involving 
skeletal, cardiac, pulmonary, corneal, and hearing impairment, and the identification of 
biomarkers able to monitor the response to enzyme replacement therapy during clinical trials 
is long overdue. To this end, the authors measured the plasma levels of 88 candidate proteins, 
finding that three of them (alpha-1-antitrypsin, lipoprotein a, and serum amyloid P) may be 
suitable surrogate end-points for clinical trials [31]. 


The main advantage of techniques that can assess biomarkers in body fluids is, of course, their 
lack of invasiveness. In this regard, a new protein technology, the SOMAscan assay (an 
aptamer-based method able to recognise specific protein epitopes), has been used to evaluate 
protein levels in the sera of DMD patients. By using this technology to compare serum samples 
from two independent DMD cohorts with healthy individuals, 44 serum biomarkers were 
identified [32]. Similarly, Auray-Blais et al. have recently applied novel MS-based high- 
throughput technologies to protein biomarker discovery in the urine samples of patients 
affected by Fabry disease, succeeding in identifying the lyso-Gb3/related analogue profile as 
a diagnostic biomarker [33]. 


Low invasiveness is also a feature of the most commonly used method of measuring and 
validating protein biomarkers, the immunoassay. Immunoassays are based on the ability of 
monoclonal antibodies to capture and detect specific protein domains and enable the simultaneous 
investigation of several proteins using very small amounts of sample. For example, in 
idiopathic pulmonary fibrosis (IPF, OMIM 178500), a rare lethal lung disease (prevalence: 1- 
5/10,000) of unknown aetiology and variable and unpredictable course, a multiplexed assay 
has been used to simultaneously evaluate 92 proteins in plasma samples from more than 200 
patients. By these means, three biomarkers predictive of IPF outcome were identified [34]. 
Other studies have used the ELISA immunoassay to evaluate serum levels of an extracellular 
matrix glycoprotein, tenascin-C (TN-C), in Emery-Dreifuss muscular dystrophy (EMD, OMIM 
310300), a rare neuromuscular disorder (1-9/1,000,000) characterised by muscular weakness 
and atrophy, with early joint contractures and cardiomyopathy, finding an association 
between elevated circulating TN-C levels and an increased risk of developing dilated cardio- 
myopathy [35]. 


Due to the low invasiveness of the methods involved, proteomic biomarkers are also very 
appealing as surrogate end-points in clinical trials and/or screening (e.g. neonatal testing). 


4.3. Other biomarkers 


As mentioned earlier, imaging technologies, and indeed any diagnostic test that is able to 
measure the disease status in patients, are useful for measuring, and therefore investigating, 
certain biomarkers. Magnetic resonance imaging (MRI), for example, is a safe and non-invasive 
method of analysing muscle, connective tissue, fat, and bone. Indeed, Kinali and co-workers 
have demonstrated that the MRI scan, focused on particular muscles, can serve as a biomarker 
for disease progression in Duchenne muscular dystrophy (DMD, OMIM: 310200), a rare 
neuromuscular disease (affecting 1/3300 male births) characterised by rapidly progressive 
muscle weakness and wasting due to degeneration of skeletal, smooth, and cardiac muscles. 
MRI can be used to accurately identify which muscles are sufficiently preserved in DMD, 
making it a reliable tool for use in clinical trials. Similarly, MRI scans of muscle biopsies are 
currently being used to correlate the clinical features of muscle diseases with the structure and 
morphology of muscle fibres [36]. 


Neurophysiological measurements can also be exploited as imaging biomarkers. For instance, 
Vucic and colleagues have reported that transcranial magnetic stimulation (TMS) is a useful 
and non-invasive method of assessing the functional integrity of the motor cortex and its 
corticomotoneuronal projections in ALS. Despite the similarities between these conditions, TMS 
was able to reliably distinguish ALS from similar peripheral disorders, thereby demonstrating its potential 
diagnostic utility [37]. 


In fact, imaging biomarkers are generally considered very appealing and have been the subject of 
intensive research in recent years. The ultimate aim of such research is the development 
of innovative methods of using imaging tools for the detection and monitoring of the 
signs and symptoms of RDs. 


5. Applications and clinical translation of biomarkers 


5.1. Diagnostic/prognostic biomarkers 


A diagnostic, or prognostic, biomarker is one that identifies a disease or quantifies its pathogenic 
factors (Figure 4). Essentially, such biomarkers are signatures that divide the population into healthy 
and diseased individuals, but in some cases they can finely stratify the disease phenotype into 
different degrees of severity or sub-phenotypes. The routine diagnostic markers classically 
used in clinical practice are temperature, blood pressure, and cholesterol levels, among others, 
whereas in genetic diseases, according to the IRDiRC statement [4], all gene mutations known 
to cause a Mendelian disease have to be considered their primary genetic biomarkers. For 
example, DMD, the most common fatal genetic disorder diagnosed during early childhood, 
arises through mutations in the causative dystrophin (DMD) gene, which are therefore 
considered disease biomarkers, and can accordingly be used to select patients for enrolment 
in clinical trials [38]. 


In some cases, mutations in causative genes can be considered biomarkers of disease severity. 
This is the case in fragile X syndrome (FXS, OMIM: 300624), a rare intellectual disability 
disorder with an estimated prevalence of about 1 in 2500 to 5000 men and 1 in 4000 to 6000 women. 
FXS is caused by an expanded CGG triplet repeat located within the 5' UTR of the FMR1 gene. 
The triplet expansion variability defines four different phenotypes, ranging from healthy to a 
severe phenotype, and can therefore be used to distinguish between them [39]. 
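

To make the idea of a repeat length acting as a severity biomarker concrete, the sketch below maps a CGG repeat count to one of four categories. The cut-offs (fewer than 45, 45 to 54, 55 to 200, and more than 200 repeats) are commonly cited ranges from clinical genetics practice and are an assumption of this example; they are not taken from this chapter or from [39], and individual laboratories may apply slightly different boundaries. The function name and category labels are likewise hypothetical.

```python
def classify_fmr1_cgg(repeat_count):
    """Assign an FMR1 CGG repeat count to one of four size categories.

    Thresholds are commonly cited ranges and are an assumption of this
    sketch, not values stated in the source text.
    """
    if repeat_count < 0:
        raise ValueError("repeat count cannot be negative")
    if repeat_count < 45:
        return "normal"
    if repeat_count < 55:
        return "intermediate"
    if repeat_count <= 200:
        return "premutation"
    return "full mutation"


# Example: categorise a few hypothetical allele sizes.
example_categories = [classify_fmr1_cgg(n) for n in (30, 50, 120, 450)]
```

A real diagnostic report would also take into account factors such as methylation status and repeat interruptions, which this simplified mapping ignores.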


In ALS (OMIM 105400), the situation is less clear cut. ALS is a devastating neurodegenerative 
disease with an incidence of 1/50,000 per year. Although several mutated genes have been 
identified in ALS (DCTN1, OMIM 601143; PRPH, OMIM 170710; SOD1, OMIM 147450; NEFH, 
OMIM 162230), the vast majority of patients do not show a defined genetic defect. This would 
seem to indicate that the causative gene is still missing [40], and research in this area has 
therefore focused on the discovery of specific biomarkers able to assist clinical diagnosis and 
monitor the disease progression. In this regard, Hwang et al. have correlated an increased level 
of HMGB1, a non-histone architectural protein, in serum samples with the onset of ALS, even 
in early stages of the disease. This increased level of HMGB1 could also be useful as a severity 
biomarker, since they also found higher HMGB1 levels in patients with a severe disease status 
[41]. Moreover, the same group has recently correlated a reduction in the protein level of the LG72 
gene, an activator of D-amino acid oxidase, with the pathogenesis of ALS [42]. 


Figure 4. Schematic representation of biomarker application (panels: population, affected patients, responders). 
A biomarker may identify, within a population, the individuals affected by a specific disease and then 
select patients able to respond to treatment/intervention. 


Another example of a diagnostic biomarker has been found in Alexander disease (ALXDRD, 
OMIM: 203450), a very rare neurodegenerative disorder (incidence of 1/2.7 million per year) 
characterised by varying degrees of macrocephaly, spasticity, ataxia, and seizures. It ultimately 
leads to psychomotor regression and death, and causative mutations have been identified in 
Glial Fibrillary Acidic Protein (GFAP), the major intermediate filament protein of astrocytes, 
which result in toxic accumulation of the protein. Animal model studies have demonstrated 
that transactivation of the GFAP promoter is an early indicator of the disease process, and that 
GFAP level in the CSF could be a potential biomarker in human patients [43]. 


Biomarkers used in clinical practice to improve disease progression monitoring or disease-risk 
prediction are defined as prognostic. Simply put, a prognostic biomarker provides information 
on the course of a disease in an untreated individual, and an example has been identified for 
Marfan syndrome (MFS, OMIM: 154700), a systemic disease of the connective tissue charac- 
terised by a wide spectrum of cardiovascular, skeletal muscular, ophthalmic, and pulmonary 
manifestations. With an estimated prevalence of around 1/5000, patients affected by MFS suffer 
from an increased risk of cardiovascular complications that lead to premature death, and a 
correlation has been demonstrated between larger aortic root diameters, coupled with faster 
aortic root growth, and high serum levels of transforming growth factor-β (TGF-β). Increasing 
levels of TGF-β predict cardiovascular events and thereby possess significant prognostic 
value [44]. 


Another biomarker for cardiac muscular involvement has been found in Fabry disease (FD, 
OMIM: 301500), a rare systemic disease (prevalence 1-5/10,000) characterised by the 
accumulation of globotriaosylceramide in the plasma and cellular lysosomes of vessels, nerves, 
tissues, and organs throughout the body. This accumulation leads to progressive skin lesions, 
renal failure, cardiac and cerebrovascular involvement, and peripheral neuropathy. Continu- 
ously elevated cardiac troponin I (cTNI), a laboratory parameter well known to reflect acute 
and chronic cardiac muscle damage, has been demonstrated in a substantial proportion of 
patients with FD, suggesting that raised cTNI levels could be a useful laboratory marker for 
assessing myocardial damage in FD [45]. 


Finally, a recent study on DMD has identified matrix metalloproteinase-9 (MMP-9) as both 
a diagnostic and prognostic biomarker. Indeed, DMD patients showed higher serum levels 
of MMP-9 and tissue inhibitor of metalloproteinase-1 (TIMP-1) proteins with respect 
to controls, with MMP-9 levels being even higher in older, non-ambulant patients than in 
ambulant patients [46]. 


5.2. Predictive/therapeutic biomarkers 


Considering the heterogeneous nature of RDs, not all patients are expected to benefit from a 
newly available treatment. Hence the identification of a sub-group of patients likely to respond 
to a novel treatment is important both in terms of health, and in terms of cost-effectiveness [12]. 
To this end, a predictive, or therapeutic, marker must be able to discriminate between drug 
responders (patients gaining benefit from the therapy) and poor/low responders (Figure 4). 
Predictive biomarkers will therefore enable the most appropriate and efficacious treatments 
or interventions to be selected for each patient, thereby underpinning a personalised approach 
to treatment. 
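

One simple way to judge whether a candidate marker separates responders from poor responders is to check the sensitivity and specificity of a cut-off applied to baseline levels. The following Python sketch does this for a hypothetical marker; the baseline values, response labels, and the threshold of 10.0 are invented for illustration and do not come from any study cited here.

```python
def evaluate_threshold(levels, responded, threshold):
    """Sensitivity and specificity of predicting response when level >= threshold."""
    tp = fp = tn = fn = 0
    for level, is_responder in zip(levels, responded):
        predicted_responder = level >= threshold
        if predicted_responder and is_responder:
            tp += 1
        elif predicted_responder and not is_responder:
            fp += 1
        elif not predicted_responder and is_responder:
            fn += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one responder and one non-responder")
    return {
        "sensitivity": tp / (tp + fn),  # responders correctly predicted
        "specificity": tn / (tn + fp),  # non-responders correctly predicted
    }


# Invented baseline marker levels and observed treatment responses.
baseline_levels = [12.0, 15.0, 9.0, 20.0, 7.0, 11.0]
observed_response = [True, True, False, True, False, False]

performance = evaluate_threshold(baseline_levels, observed_response, threshold=10.0)
```

With these toy values every responder is captured, while one of the three non-responders is misclassified. In practice the threshold would be chosen and validated on independent cohorts rather than on the data used to derive it.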


There are a few examples of therapeutic biomarkers useful in RDs, generally SNPs, as in typical 
pharmacogenetics, although some protein studies have also been reported. For instance, a 
pharmacological predictive biomarker has been reported in idiopathic nephrotic syndrome, an 
RD affecting the kidneys. Specifically, Wen et al. [47] found a significant difference in the serum 
proteome of steroid-sensitive nephrotic syndrome (SSNS) and steroid-resistant nephrotic 
syndrome (SRNS, OMIM 256370) patients, predictive of their respective responses to treat- 
ment. 


Another example of a predictive biomarker has been found for Gaucher disease (GD, OMIM: 
230800), a rare recessive genetic disorder (approximate prevalence 1/100,000) caused by 
mutations in the GBA gene, which codes for a lysosomal enzyme, glucocerebrosidase. 
Although the clinical manifestations of this disease are extremely variable, ranging from non- 
neurological manifestations such as organomegaly, bone anomalies, and cytopenia to acute 
neurological forms, a recent time-course analysis of ferritin, chitotriosidase, haemoglobin, and 
platelets showed that the levels of these biomarkers undergo variation during the course of 
enzyme replacement therapy [48]. 


Sometimes the same biomarker can be useful in multiple scenarios. For example, TGF-β, in 
addition to serving a prognostic function in Marfan syndrome, could feasibly be used as a 
therapeutic biomarker in the same condition. Indeed, in a recent study, patients who responded 
to losartan, used to reduce the aortic root dilatation rate, had higher baseline TGF-β levels 
but exhibited lower plasma TGF-β concentrations during losartan therapy [49]. 


Predictive biomarkers such as these are likely to play an increasingly important role in clinical 
practice, since evaluating the efficacy of a treatment/intervention is fundamental to making 
decisions about treatment choices, and therefore determining therapy outcomes. 


6. Conclusions 


Since the definition of biomarkers in 2001, their importance in clinical and research settings 
has increased dramatically due to their diagnostic/prognostic functions and their ability to 
monitor/predict disease stage, treatment response, and/or adverse effects. Indeed, the creation 
of an exhaustive catalogue of approved biomarkers may be the single most important inno- 
vation in healthcare, bringing considerable clinical and economic benefits. Although current 
research, both academic and corporate, is heavily focused on the development of drugs and 
companion diagnostic tests, in the future, biomarker discovery and development will be vital 
for tailoring medical care to individual patients. This will be especially important in the field 
of RDs, in which the discovery of efficacious biomarkers is likely to greatly facilitate the process 
of EMA approval and development of novel orphan drugs. In addition to being both 
time- and cost-effective, biomarker research also provides exciting opportunities to expand our 
knowledge of the physiopathological mechanisms behind rare and other diseases, helping to 
discriminate between distinct disease presentations and comorbidities, as well as predict the 
different impacts of concomitant medication, and various important demographic parameters 
such as gender, age, and ethnicity. In short, biomarker discovery represents a giant leap 
towards the ultimate goal of truly personalised medicine. 